1b), and if it’s clear that the fraction of positive outcomes isn’t leveling off at 0 or 1 for very large or very small X values, then logistic regression is not the correct modeling approach.
The H-L (Hosmer-Lemeshow) test, described earlier in the section “Assessing the adequacy of the model,” provides a formal statistical check of whether your data qualify for logistic regression. Also, in Chapter 19, we describe a more generalized logistic model that contains additional parameters for the upper and lower leveling-off values.
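If you want to compute the H-L statistic yourself, here is a minimal sketch in Python, assuming you already have an array of observed 0/1 outcomes and an array of predicted probabilities from a fitted model. The function name and the default of ten groups are our own choices, not part of any standard library.

```python
import pandas as pd
from scipy.stats import chi2

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square test on groups of predicted probability."""
    df = pd.DataFrame({"y": y, "p": p})
    df["bin"] = pd.qcut(df["p"], q=groups, duplicates="drop")
    g = df.groupby("bin", observed=True)
    n = g.size()        # observations per group
    obs = g["y"].sum()  # observed "yes" outcomes per group
    exp = g["p"].sum()  # expected "yes" outcomes per group
    # Chi-square statistic: sum over groups of (O - E)^2 / (n * pbar * (1 - pbar))
    stat = (((obs - exp) ** 2) / (exp * (1 - exp / n))).sum()
    dof = len(n) - 2    # conventional degrees of freedom for the H-L test
    return stat, chi2.sf(stat, dof)
```

A large p value (say, above 0.05) means the model’s predicted probabilities are consistent with the observed fractions of positive outcomes.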
Watch out for collinearity and disappearing significance: When you are doing any kind of regression and two or more predictor variables are strongly correlated with one another, you can be plagued with problems of collinearity. We describe this problem in Chapter 17, and potential modeling solutions in Chapter 20.
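One common way to screen for collinearity before fitting the model is to compute variance inflation factors (VIFs). Here is a minimal sketch using statsmodels; the predictor names and values are hypothetical stand-ins for your own data.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors; replace with your own data
predictors = pd.DataFrame({
    "dose": [10, 50, 120, 200, 300, 450, 500, 650],
    "age":  [25, 40, 35, 60, 55, 70, 45, 65],
})
X = sm.add_constant(predictors)  # add an intercept column
vif = {col: variance_inflation_factor(X.values, i)
       for i, col in enumerate(X.columns) if col != "const"}
print(vif)  # VIFs much above 5 or 10 flag troublesome collinearity
```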
Check for inadvertent reverse-coding of the outcome variable: The outcome variable should always be coded as 1 for a yes outcome and 0 for a no outcome (refer to Table 18-1 for an example). If the variable in the data set is coded using characters, you should recode the outcome variable using 0/1 coding. It is important that you do the coding yourself rather than leaving it to an automated function in the software, which may inadvertently reverse the coding so that 1 = no and 0 = yes. This reversal won’t affect any p values, but it will cause all your ORs and their CIs to be the reciprocals of what they would have been, meaning they will refer to the odds of no rather than the odds of yes.
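For instance, in Python with pandas, you can do the recoding explicitly and then verify the direction, rather than letting the software pick it for you. The column names here are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"died": ["Yes", "No", "No", "Yes"]})  # hypothetical data
# Code the outcome yourself: 1 = yes, 0 = no
df["died01"] = (df["died"].str.strip().str.lower() == "yes").astype(int)
# Always cross-tabulate to confirm the direction before fitting the model
print(pd.crosstab(df["died"], df["died01"]))
```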
Don’t misinterpret odds ratios for categorical predictors: Categorical predictors should be coded numerically as we describe in Chapter 8. It is important to ensure that proper indicator variable coding is used and that these variables are entered into the model correctly, as described in Chapter 17.
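As a minimal illustration of indicator (dummy) coding in pandas, here is one way to set an explicit reference level; the variable name smoking and its levels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"smoking": ["never", "current", "former", "never"]})
# Order the categories so that "never" is the reference level
df["smoking"] = pd.Categorical(df["smoking"],
                               categories=["never", "former", "current"])
dummies = pd.get_dummies(df["smoking"], prefix="smoking", drop_first=True)
# Each remaining indicator's OR compares its level to the reference, "never"
print(dummies)
```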
Also, be careful not to misinterpret odds ratios for numerical predictors, and be mindful of the
complete separation problem, as described in the following sections.
Don’t misinterpret odds ratios for numerical predictors
The OR always represents the factor by which the odds of the outcome event increase
when the predictor increases by exactly one unit of measure, whatever that unit may be.
Sometimes you may want to express the OR in more convenient units than the ones in which the data
were recorded. For the example in Table 18-1, the OR for dose as a predictor of death is 1.0115 per
REM. This isn’t too meaningful because one REM is a very small increment of radiation. By
raising 1.0115 to the 100th power, you get the equivalent OR of 3.1375 per 100 REMs, and you
can express this as, “Every additional 100 REMs of radiation more than triples the odds of
dying.”
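The rescaling is just exponentiation, and it must be applied to the confidence limits as well as to the OR itself. Here is a quick sketch; only the 1.0115 OR comes from Table 18-1, and the confidence limits shown are hypothetical.

```python
or_per_rem = 1.0115
ci_per_rem = (1.0085, 1.0145)      # hypothetical 95% confidence limits
or_per_100rem = or_per_rem ** 100  # about 3.1375
ci_per_100rem = tuple(x ** 100 for x in ci_per_rem)
print(or_per_100rem, ci_per_100rem)
```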
The value of a regression coefficient depends on the units in which the corresponding predictor
variable is expressed. So the coefficient of a height variable expressed in meters is 100 times larger
than the coefficient of height expressed in centimeters. In logistic regression, ORs are obtained by
exponentiating the coefficients, so switching from centimeters to meters corresponds to raising the OR
(and its confidence limits) to the 100th power.
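To see this relationship numerically, here is a tiny sketch with a made-up coefficient.

```python
import numpy as np

b_per_cm = 0.02               # hypothetical coefficient, height in centimeters
b_per_m = 100 * b_per_cm      # the same model fit with height in meters
or_per_cm = np.exp(b_per_cm)  # OR per additional centimeter
or_per_m = np.exp(b_per_m)    # OR per additional meter
# The two ORs differ by exactly the 100th power
print(np.isclose(or_per_m, or_per_cm ** 100))  # True
```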